
Will solid-state drives accelerate your bioinformatics? In-depth profiling, performance analysis, and beyond

Abstract

A wide variety of large-scale data has been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. However, the time demand of many bioinformatics programs remains high for large-scale practical use, owing to factors that hinder acceleration by parallelization. Recently, new generations of storage devices have emerged, such as NAND flash-based solid-state drives (SSDs). With renewed interest in near-data processing, these devices are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, simply replacing hard disk drives (HDDs) with SSDs as a drop-in substitute yields dramatic speedups. Despite the various advantages and continuous cost reduction of SSDs, there has been little review of SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. For an informative review, we perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of devices. Based on the insight we obtain from this research, we further discuss issues related to designing and optimizing bioinformatics algorithms and pipelines to fully exploit SSDs. The programs we profile cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling, and metagenomics. We explain how acceleration by parallelization can be combined with SSDs for improved performance, and how using SSDs can expedite important bioinformatics pipelines, such as variant calling with the Genome Analysis Toolkit (GATK) and transcriptome analysis using RNA sequencing (RNA-seq). We hope that this review provides useful directions and tips for future bioinformatics algorithm design that properly considers new generations of powerful storage devices.